ParaConc: Concordance Software for Multilingual Parallel Corpora
Abstract
Parallel concordance software provides a general-purpose tool that permits a wide range of investigations of translated texts, from the analysis of bilingual terminology and phraseology to the study of alternative translations of a single text. This paper outlines the main features of a Windows concordancer, ParaConc, focussing on alignment of parallel (translated) texts, general search procedures, identification of translation equivalents, and the furnishing of basic frequency information. ParaConc accepts up to four parallel texts, which might be in four different languages or consist of an original text plus three different translations. A semi-automatic alignment utility is included in the program to prepare texts that are not already aligned. Simple text searches for words or phrases can be performed, and the resulting concordance lines can be sorted according to the alphabetical order of the words surrounding the search word. More complex searches are also possible, including context searches, searches based on regular expressions, and word/part-of-speech searches (assuming that the corpus is tagged for POS). Corpus frequency and collocate frequency information can be obtained. The program includes features for highlighting potential translations, including an automatic component, "Hot words", which uses frequency information to suggest possible translations of the search word.
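The search-and-sort behaviour described in the abstract can be illustrated with a small sketch. This is not ParaConc's implementation; it is a minimal Python example that assumes the texts are already aligned segment by segment, and the names `parallel_kwic`, `sort_by_right`, `sort_by_left`, and the sample sentences are all hypothetical.

```python
import re

def parallel_kwic(aligned_pairs, word, width=3):
    """Find `word` in the source side of aligned (source, target) pairs.

    Returns one dict per hit holding the left and right context tokens
    and the aligned target segment. Illustrative only.
    """
    word = word.lower()
    hits = []
    for source, target in aligned_pairs:
        tokens = re.findall(r"\w+", source.lower())
        for i, tok in enumerate(tokens):
            if tok == word:
                hits.append({
                    "left": tokens[max(0, i - width):i],
                    "hit": tok,
                    "right": tokens[i + 1:i + 1 + width],
                    "target": target,
                })
    return hits

def sort_by_right(hits):
    # Alphabetical order of the words following the search word.
    return sorted(hits, key=lambda h: h["right"])

def sort_by_left(hits):
    # Alphabetical order of the words preceding the search word,
    # comparing the nearest word first.
    return sorted(hits, key=lambda h: h["left"][::-1])

# Hypothetical aligned English-French segments.
pairs = [
    ("The head of the department resigned.",
     "Le chef du département a démissionné."),
    ("She shook her head slowly.",
     "Elle secoua lentement la tête."),
    ("Head north along the river.",
     "Dirigez-vous vers le nord le long de la rivière."),
]

hits = parallel_kwic(pairs, "head")
for h in sort_by_right(hits):
    print(" ".join(h["left"]), "[" + h["hit"] + "]",
          " ".join(h["right"]), "|", h["target"])
```

Showing each hit next to its aligned target segment is what lets a reader compare how the same search word is rendered across translations; a real concordancer would add alignment, tagging, and frequency features on top of this.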
Similar resources
ParaConc: Concordance software for multilingual parallel corpora
Parallel concordance software provides a general purpose tool that permits a wide range of investigations of translated texts, from the analysis of bilingual terminology and phraseology to the study of alternative translations of a single text. The software is not tied to particular languages and so can be used with English-Chinese texts, French-Italian texts, and so on. This paper describes th...
A Corpus-Based Study of Restrictive Relative Clauses
This paper aims to investigate the similarities and differences of Restrictive Relative Clauses (RRC) among three languages by comparing and contrasting parallel data extracted from a POS-tagged multilingual corpus. This research further provides examples for corpus-based language analysis and application of SLA. This investigation consists of three major works. First, we construct a POS-tagged multiling...
YaMTG: An Open-Source Heavily Multilingual Translation Graph Extracted from Wiktionaries and Parallel Corpora
This paper describes YaMTG (Yet another Multilingual Translation Graph), a new open-source heavily multilingual translation database (over 664 languages represented) built using several sources, namely various wiktionaries and the OPUS parallel corpora (Tiedemann, 2009). We detail the translation extraction process for 21 wiktionary language editions, and provide an evaluation of the translatio...
An Open-Source Heavily Multilingual Translation Graph Extracted from Wiktionaries and Parallel Corpora
This paper describes YaMTG (Yet another Multilingual Translation Graph), a new open-source heavily multilingual translation database (over 664 languages represented) built using several sources, namely various wiktionaries and the OPUS parallel corpora (Tiedemann, 2009). We detail the translation extraction process for 21 wiktionary language editions, and provide an evaluation of the translatio...
Parallel Corpora, Alignment Technologies and Further Prospects in Multilingual Resources and Technology Infrastructure
Multilingual technologies, which to a large extent are language independent, provide a powerful support for easier building of annotated linguistic resources for languages where such resources are scarce or missing. All these technologies require parallel corpora in order to achieve their ends. Parallel texts encode extremely valuable linguistic knowledge because the linguistic decisions made b...